(12) United States Patent 

Henton 




US006738738B2 

(10) Patent No.: US 6,738,738 B2
(45) Date of Patent: May 18, 2004 



(54) AUTOMATED TRANSFORMATION FROM 
AMERICAN ENGLISH TO BRITISH 
ENGLISH 

(75) Inventor: Caroline G. Henton, Santa Cruz, CA 
(US) 

(73) Assignee: Tellme Networks, Inc., Mountain View, 
CA (US) 

( * ) Notice: Subject to any disclaimer, the term of this 
patent is extended or adjusted under 35 
U.S.C. 154(b) by 673 days. 

(21) Appl. No.: 09/745,371 

(22) Filed: Dec. 23, 2000 

(65) Prior Publication Data 

US 2002/0173966 A1 Nov. 21, 2002

(51) Int. Cl. 7 G06F 17/28

(52) U.S. Cl. 704/2; 704/227; 704/277

(58) Field of Search 704/2-10, 277, 

704/227; 707/375 

(56) References Cited 

U.S. PATENT DOCUMENTS 

6,154,758 A * 11/2000 Chiang 715/541 

6,188,984 B1 * 2/2001 Manwaring et al. 704/260

6,493,744 B1 * 12/2002 Emens et al. 709/203

6,618,697 B1 * 9/2003 Kantrowitz et al. 703/22

OTHER PUBLICATIONS 

Humphries et al., The Use of Accent-Specific Pronunciation 
Dictionaries in Acoustic Model Training, 1998, IEEE, p. 
317-320.* 



Jeremy Smith, American British British American Dictio- 
nary, Mar. 6, 2000, Internet.* 

* cited by examiner 

Primary Examiner: Talivaldis Ivars Smits

Assistant Examiner: Lamont M Spooner

(74) Attorney, Agent, or Firm: Bever, Hoffman & Harms,

LLP; Jeanette S. Harms



(57) 



ABSTRACT 



A method of transforming a voice application program 
designed for US English speakers to a voice application 
program for UK English speakers using a computer system 
is described. In one embodiment, scripts and grammars 
associated with the voice application program are converted 
from US-to-UK English. The process includes spelling 
normalization, lexical normalization, and pronunciation 
conversion (including where appropriate accounting for 
stress shifts). The result is necessary word pronunciations 
for speech recognition of UK English speakers (especially for
proper nouns) as well as a script that has been conformed to 
use UK English spelling and lexical conventions. 
Additionally, the script can be annotated with pronuncia- 
tions as a part of the process. Further, in one embodiment a 
web based interface to the conversion process is provided 
either standalone or as part of a voice application develop- 
ment environment. 

16 Claims, 1 Drawing Sheet 



[FIG. 1: process flow diagram. Input: US Script 100. Steps: Normalize
Spellings 120; Normalize Lexical Differences 130; Perform Phoneme
Conversions 140; Goldenize 150. Outputs: UK Script 160; UK Grammars
162; UK Pronunciation Data.]


AUTOMATED TRANSFORMATION FROM 
AMERICAN ENGLISH TO BRITISH 
ENGLISH 



BACKGROUND OF THE INVENTION 

1. Field of the Invention 

This invention relates to the field of phonetics. In 
particular, the invention relates to technologies for trans- 
forming pronunciations appropriate for American English 
into pronunciations appropriate for British English. 

2. Description of the Related Art 
A. Notation 

Before turning to definitions, some notational concerns 
will be addressed. A standard notational alphabet, the Inter- 
national Phonetic Alphabet (IPA) can be used to represent 
the pronunciation of words using phonemes. However, the 
IPA uses symbols that are difficult to represent easily in 
ASCII systems and further many of the symbols lack 
appropriate representational glyphs in standard computer 
fonts. (Newer systems that handle Unicode can represent 
IPA symbols directly and frequently include newer fonts 
with appropriate glyphs for IPA symbols.) Accordingly, it is 
more convenient and has become industry standard practice 
to use the Computer Phonetic Alphabet (CPA) in computer 
speech recognition and pronunciation generation tools such 
as "autopron", from Nuance Communications, Menlo Park,
Calif., and "namepro", from E-Speech Corporation,
Princeton, N.J.

The CPA has the advantage that it can be represented 
using standard ASCII characters using the glyphs in com- 
monly available fonts. The following tables show the cor- 
respondence between CPA and IPA symbols for American 
English and British English. 

TABLE 1

American English: Computer Phonetic Alphabet (CPA) to
International Phonetic Alphabet (IPA) Correspondence

Vowels

CPA   Example
i     fleet
I     dimple
e     date
E     bet
a     cat
aj    side
Oj    toy
^     cut
u     blue
U     book
o     show
O     caught
A     father, cot
aw    couch
*r    bird
*     alive

Consonants (CPA symbols)

Stops:         p, t, k, b, d, g
Flaps:         !
Nasals:        m, n, [third symbol illegible]
Affricates:    tS, dZ
Fricatives:    f, T, s, S, v, D, z, Z, h
Approximants:  j, r, w, l

[IPA column: glyphs not recoverable.]

TABLE 2

British English: Computer Phonetic Alphabet (CPA) to
International Phonetic Alphabet (IPA) Correspondence

Vowels

CPA   Example
i     bean
I     bin
e     bane
E     bet
a     bat
A     father
@     cot
O     caught
o     go
U     book
u     toot
^     cup
3     bird
*     alive, rider
aj    five
Oj    boy
aw    cow
i*    beer
e*    bear
u*    poor

Consonants (CPA symbols)

Stops:         p, t, k, b, d, g
Flaps:         !
Nasals:        m, n, [third symbol illegible]
Affricates:    tS, dZ
Fricatives:    f, T, s, S, v, D, z, Z, h
Approximants:  j, r, w, l

[IPA column: glyphs not recoverable.]

Throughout the remainder of this document, the CPA symbols will be
used to represent phonemes in transcriptions. Transcriptions written
in CPA symbols will be identified as corresponding to British English
(UK) or American English (US) where this is not clear from the context
and is relevant to understanding the material. Additionally, to
minimize confusion, US English conventions for spelling and style will
be used throughout the body of this specification, except in examples
and rules. Additionally, the UK CPA forms are used for Australian and
New Zealand pronunciations.

The possible sounds that a human being can produce by moving the
lips, tongue, and other speech organs are called phones. These sounds
are generally grouped into logically related groups, each a phoneme.
In a given language only certain sounds are distinguished (or
distinguishable) by speakers of the language, i.e. they conceptualize
them as different sounds. These distinguishable sounds are phonemes.
In fact, a phoneme may be defined as a group of related phones that
are regarded as the same sound by speakers. The different sounds that
are part of the same phoneme are called allophones (or allophonic
variants).

Returning to notation issues, the phonemic transcription of a word
will be shown between slashes ("/ /"). For clarity, a visible
separator glyph will be placed between each phoneme in the
transcription, e.g. /k-O-r-n-*r/ for "corner" (US), to represent the
space character visibly. In many computer programs a space character
is used to represent the boundary between phonemes; however, in a
printed publication using the standard glyph for the space character,
" ", might lead to ambiguities, e.g. between /*r/ and /* r/ (US), etc.

If used, phonetic transcriptions will be shown in brackets ("[ ]").
Phonetic transcriptions distinguish between the different phones that
are allophones of the phoneme.
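
The slash-delimited notation described above can be sketched in a few
lines of Python. This is only an illustration, not part of the
described method: the function names are hypothetical, and the
separator "-" is an assumption taken from how the "corner" example is
rendered in this text.

```python
SEP = "-"  # assumed visible separator, as rendered in the "corner" example


def format_transcription(phonemes, sep=SEP):
    """Render a list of CPA phonemes as a slash-delimited transcription."""
    return "/" + sep.join(phonemes) + "/"


def parse_transcription(text, sep=SEP):
    """Split a slash-delimited transcription back into its CPA phonemes."""
    if len(text) < 2 or not (text.startswith("/") and text.endswith("/")):
        raise ValueError(f"not a phonemic transcription: {text!r}")
    body = text[1:-1]
    return body.split(sep) if body else []


corner_us = ["k", "O", "r", "n", "*r"]
print(format_transcription(corner_us))   # /k-O-r-n-*r/
print(parse_transcription("/*r/"))       # ['*r']      (one phoneme)
print(parse_transcription("/*-r/"))      # ['*', 'r']  (two phonemes)
```

With an explicit separator, the one-phoneme and two-phoneme readings
stay distinct, which is the ambiguity the visible glyph is meant to
avoid.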

B. Role of Phonemic Transcriptions in Speech Software

Speech recognizers (both speaker independent and speaker dependent
varieties) rely on pronunciations to perform recognition. For example,
in order for the Nuance™ speech recognition software from Nuance
Communications to recognize a word in a recognition grammar, a
pronunciation (e.g. phonemic transcription) must be available. To
support recognition, Nuance provides a large phonemic dictionary that
includes pronunciations for many American English words. The content
of the dictionary typically excludes proper nouns and made up words,
e.g. "Kodak"; however, there may be extensions for particular
purposes, e.g. for US equity issues (stocks).

Additionally, Nuance provides an automated tool, "autopron", that
attempts to generate (simply from the spelling of the word) a usable
pronunciation. Other companies, e.g. E-Speech, specialize in providing
software that they claim can do a better job at generating such
pronunciations.

Symmetrically, a good pronunciation is also important to producing
good synthesized speech (or in the case where a human is reading a
script, providing the human with extra guidance about the correct
pronunciation). Thus, a useful phonemic transcription is important to
many aspects of computer speech technology.

C. British English and American English

Although American English and British English share a common origin,
there are significant differences in grammar (word choice, vocabulary,
spelling, etc.), pronunciation, and text normalization (e.g. time
formats, data formats, etc.).

One can typically purchase an electronic dictionary of British
English, e.g. for use in spell checking, or even a phonetic one for
use with products such as the Nuance speech recognition system.
However, such a pronunciation dictionary assumes that materials have
already been prepared in British English form.

For example, given a particular word like "attorney" in a production
script for a voice application (e.g. yellow pages) that was prepared
for American English speakers, there are several problems. First, if
presented in a list of options, "attorney" will sound awkward to a
British native since they expect the term "solicitor" (or perhaps if
trying to get out of gaol a "barrister"). Similarly, the native
British speaker is unlikely to provide the verbal command "attorney"
to the speech recognition system. Lastly, even if the British speaker
did provide the word "attorney", the pronunciation will be different
from the one used by Americans. This also has an impact on the
recording of the program script where prompts for categories such as
"attorneys" would need to be re-recorded.

These problems may be further exacerbated in the realm of proper
nouns, e.g. names and places, as well as made up words, e.g. company
names, movie/book titles, etc., where even if a British English
dictionary were provided the term would not likely be present.

D. Noting Stress in CPA

Presently (as seen above in Tables 1 and 2), the CPA does not support
the representation of stress within a word. This limits its usefulness
(as compared to IPA representations) in designating differences in
pronunciation. For example, "advertisement" is pronounced in US
English with the stress on the penultimate syllable of the word,
whereas UK English places the stress on the second syllable of the
word. Shifting the stress changes the pronunciation.

Although present generation speech recognition systems (e.g. Nuance)
do not make use of stress (see absence of the same from CPA, above),
the stress information is essential for a voice talent performing a
script and may potentially be useful in enhanced speech recognition.

E. Conclusion

Prior techniques for converting US English to UK English have required
humans to perform textual normalization and pronunciation
transformations. Accordingly, what is needed is a method and apparatus
for automating the transformation.

Prior techniques for representing word pronunciations in ASCII
characters have not supported indicating word stress. Accordingly,
what is needed is a method and apparatus for indicating word stress in
a fashion compatible with both US and UK CPA representations, as well
as a method and apparatus for presenting a version of the
pronunciations without word stress to incompatible speech synthesis
and recognition systems.

Prior techniques for preparing voice application programs do not
easily allow a script initially prepared for US English to be
automatically converted to UK English. Accordingly, what is needed is
a method and apparatus for textually normalizing a document from US
English to UK English and for refining a US English phonemic
transcription using one or more well defined rules to produce more
accurate transcriptions for UK English.

SUMMARY OF THE INVENTION

A naive assumption might be made that US and UK English are similar
enough to allow a program designed for one market to simply be used in
the other. Typically, due to the size of the US market, an application
might first be prepared for the US and only later for the UK (and
possibly continental Europe, where the UK variety of English is used).
For a voice application, it is necessary to ensure that UK English
pronunciations for all words are available to enable speech
recognition and speech generation (both by text-to-speech and human
voice talent).

Accordingly, a method of transforming a voice application program
designed for US English speakers to a voice application program for UK
English speakers using a computer system is described. Automation
allows what would otherwise be a tedious manual process to be
automated and focuses human intervention (when needed) on making
corrections.

In one embodiment, scripts and grammars associated with the voice
application program are converted from US-to-UK English. Three primary
tasks can be completed in this process: spelling normalization,
lexical normalization, and pronunciation conversion (including where
appropriate accounting for stress shifts).

A converted script can be generated using the method and apparatus
described herein and a preferred pronunciation from such effort
inserted into the script in appropriate locations to assist the voice
talent. For example, in one embodiment words with stress differences
from the US English forms have their pronunciations listed in the
script.

Similarly, the method and apparatus can be integrated into a remotely
hosted development environment. This can allow developers who would
otherwise be unlikely to have the resources and skill to convert their
program independently to do so in a highly automated fashion.
Additionally, any manual intervention can be focused on answering
specific questions: what part of speech is this, etc. Further, such
questions are much more easily answered by a non-professional (in
linguistics/phonetics) than the broader conversion questions.

DESCRIPTION OF THE FIGURES

FIG. 1 is a process flow diagram for generating phonemic variations.


DETAILED DESCRIPTION 
A. Introduction 

The techniques described herein can be applied in several fashions.
In the most basic example, a list of US English words (with their
pronunciations) could be transformed into their respective UK English
orthographic forms (spellings) and pronunciations. Additionally,
further textual normalization of the list can be performed to replace
words not commonly used in Britain with their more common British
substitute, e.g. "hoover" (UK) for "vacuum" (US), etc. As such, if a
US English pronunciation dictionary has been purchased (e.g. for your
speech recognition system), the above approach (minus word
replacements) can be used to generate a UK English pronunciation
dictionary from scratch.

However, the focus of the discussion will be on conversion of a
program script for a voice application (and its associated grammars)
originally prepared for US English into UK English. For purposes of
this discussion it will be assumed that there is an appropriate US
English pronunciation for all words in the program script. In a
preferred embodiment, the original US program script and grammars are
handled as described in U.S. patent application Ser. No. 09/721,373,
entitled "Automated Creation of Phonemic Variations", filed Nov. 22,
2000, having inventor Caroline G. Henton. More specifically, in one
embodiment, for each word in the program script and grammar there is a
goldenized US English pronunciation (e.g. a pronunciation adopted as
authoritative or golden).

It should also be noted that many of these rules could be sensibly
applied to accomplish the reverse task: converting a UK English
application to US English. However, since at present a larger portion
of applications tend to be prepared first for US English and then
"converted" or "internationalized" for UK English, the focus will be
on the US-to-UK transformations.

The process of transforming a script and grammar from US-to-UK will
now be described in greater detail with respect to FIG. 1. The process
of FIG. 1 can be implemented using one or more computer systems with,
or without, a direct human operator.

The process starts with a US script 100 and corresponding US grammars
102. The script portion corresponds to those pieces of text to be read
by human voice talents or computer synthesized speech. The grammars
correspond to those pieces of text that the computer expects the user
of a voice application to say at various junctures. An example for a
simple "hello world" application may be helpful. The US script 100
might contain a single line script: "Hello World, say Menu to return
to the main menu" and the grammar file US grammars 102 would consist
of a single choice "menu" in the appropriate grammar file format. In
one embodiment, the grammars (e.g. US grammars 102 and UK grammars
162) are formatted according to the Nuance GSL language for grammars;
in another embodiment, an XML grammar representation format is used.
The words in scripts and grammars can be represented using one or more
standard character sets (e.g. ASCII, Unicode, etc.) in one or more
text encodings (e.g. ISO-8859, UTF-8, UTF-16, etc.).
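
As a rough illustration of the "hello world" example, the sketch below
holds the script and grammar as plain Python lists and collects the
words for which pronunciations would be needed. The list-based
representation and the `vocabulary` helper are assumptions made for
illustration only; no GSL or XML grammar syntax is modeled here.

```python
import re

# Hypothetical in-memory stand-ins for US script 100 and US grammars 102.
us_script = ["Hello World, say Menu to return to the main menu"]
us_grammars = ["menu"]


def vocabulary(lines):
    """Return the set of lower-cased words that appear in the given lines."""
    words = set()
    for line in lines:
        words.update(w.lower() for w in re.findall(r"[A-Za-z']+", line))
    return words


script_words = vocabulary(us_script)
grammar_words = vocabulary(us_grammars)

print(sorted(script_words))
# ['hello', 'main', 'menu', 'return', 'say', 'the', 'to', 'world']
print(sorted(grammar_words))
# ['menu']
```

Every word collected this way is one for which the process assumes a
US English pronunciation already exists.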

The process begins at step 120 with the normalization of spellings.
As described more fully below, a number of straightforward rules can
be applied to bring the US spelling close to the common UK spelling.
Additionally, ad hoc rules can be provided as well (e.g. a list of
exceptions with the correct form, e.g. "yogurt" (US) to "yoghurt"
(UK)).

Next, lexical differences between the vocabularies can be adjusted at
step 130. As noted above, words like "attorney", which are common in
US English, lack significant semantic meaning in UK English, where the
term "solicitor" is common. A set of sample rules covering a large
range of common uses is presented along with a general approach for
preparing (and handling) such lists. Additional lists can easily be
prepared for specialized areas including: financial terms, units of
measure, musical notation, automotive parts, betting terms, botanical
and zoological names, food names, slang, and cricket terms. Note:
stylistic differences, e.g. in the presentation of time, system of
measurement, etc., will not be addressed by lexical transformations.
Thus, for example, a set of measurements delivered in imperial units
(as opposed to metric units) will not be flagged; however, the use of
terms like "quart", which is not used in the British imperial system,
will be flagged.
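
A minimal sketch of step 130 follows, assuming a simple word-for-word
lookup. The two mappings and the flagged term come from examples in
this text; the table names, the function, and the whole-word matching
strategy are illustrative assumptions. Plurals and multi-word terms
are not handled.

```python
import re

# Illustrative entries only; a real list would be far larger.
LEXICAL_MAP = {"attorney": "solicitor", "vacuum": "hoover"}
FLAG_FOR_REVIEW = {"quart"}  # no direct UK equivalent: leave in place, flag


def normalize_lexicon(text):
    """Replace mapped US words with UK terms; collect words needing review."""
    flagged = []

    def swap(match):
        word = match.group(0)
        key = word.lower()
        if key in FLAG_FOR_REVIEW:
            flagged.append(word)
            return word
        uk = LEXICAL_MAP.get(key)
        if uk is None:
            return word
        # Preserve an initial capital from the source word.
        return uk.capitalize() if word[0].isupper() else uk

    return re.sub(r"[A-Za-z]+", swap, text), flagged


uk_text, review = normalize_lexicon("Attorney listings, or one quart of milk")
print(uk_text)  # Solicitor listings, or one quart of milk
print(review)   # ['quart']
```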

Next, at step 140 phonemic conversions to UK English 
are performed. This may (as shown in FIG. 1) result in the 
generation of the UK Pronunciation data 112. However, if a 
good source of UK pronunciation data is available the 
process may instead rely on such data. The process of step 
140, as noted, relies on US pronunciation data 110 for the 
words in the US script 100 and the US grammars 102. This 
process will be discussed in more detail in conjunction with 
several rules for determining pronunciations for UK English 
from US English. 

The final step, step 150, involves goldenizing the pronun- 
ciations (and scripts/grammars), that is selecting an 
approved transcription for use in the system. This process 
may be automated, manual, and/or a combination of the two.
For example, transcriptions might automatically become 
available within the pronunciation data 112 prior to gold- 
enization (e.g. after step 140); however, they could be 
flagged as such to avoid their use in secondary purposes, e.g. 
in the recording of a script. In the case of multiple tran- 
scriptions for a single word, goldenization may include 
selecting the golden, i.e. preferred, pronunciation for a word. 
The goldenized pronunciation is the one that should be used 
by automatic Text-to-Speech (TTS) processes and by human 
voice talents in reading scripts that contain words in the 
scripts (or other words for which pronunciations have been 
generated). Additional, variant, pronunciations remain use- 
ful for speech recognition purposes if they represent the 
common variant pronunciations for a word. Although gold- 
enization is primarily discussed in the context of approving 
transcriptions, the UK script 160 (and UK grammars 162) 
outputs will also typically be reviewed. In some embodiments,
the step 130 may have flagged certain portions of the script 
or grammar for manual review (e.g. no easily determined 
semantic equivalent). 
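
One way to picture goldenization is as a record that keeps every
variant transcription for recognition while marking a single approved
one for TTS and voice talent. The class below is a hypothetical sketch
under that assumption; the class name, its methods, and the two
"tomato" transcriptions are illustrative and are not taken from the
patent.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PronunciationEntry:
    word: str
    variants: List[str]           # candidate transcriptions, e.g. from step 140
    golden: Optional[str] = None  # approved transcription; None until goldenized

    def goldenize(self, choice: str) -> None:
        """Approve one of the known variants as the golden pronunciation."""
        if choice not in self.variants:
            raise ValueError(f"{choice!r} is not a known variant of {self.word!r}")
        self.golden = choice

    def for_tts(self) -> str:
        """Transcription for TTS or voice talent; requires goldenization."""
        if self.golden is None:
            raise LookupError(f"{self.word!r} has not been goldenized")
        return self.golden

    def for_recognition(self) -> List[str]:
        """All variants remain useful to a speech recognizer."""
        return list(self.variants)


# Illustrative transcriptions built from CPA-style symbols.
entry = PronunciationEntry("tomato", ["/t-*-m-A-t-o/", "/t-*-m-e-t-o/"])
entry.goldenize("/t-*-m-A-t-o/")
print(entry.for_tts())          # /t-*-m-A-t-o/
print(entry.for_recognition())  # ['/t-*-m-A-t-o/', '/t-*-m-e-t-o/']
```

Raising an error for an ungoldenized entry mirrors the idea that
unapproved transcriptions should not be used for secondary purposes
such as recording a script.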

It should be noted that sometimes there is no substitute to 
having a human talk to a business establishment, or a local 
of a particular area, to determine the locally used pronun- 
ciation. For example, in Mountain View, Calif., there is a 
restaurant called "Vivaca" and none of the automated (or 
initial human) efforts to create the appropriate transcription 
were successful (due to the odd pronunciation the proprietor 
and locals use — that does not correspond to the apparent 
origin of the word). Thus, although the automated processes 
of steps 120-150 significantly reduce the costs and likelihood
of error in preparing a program for use in UK locales, some 
human oversight will always remain prudent. 
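
The four steps of FIG. 1 can be pictured as a small pipeline. The
sketch below is only a skeleton under stated assumptions: each step is
a deliberately tiny stand-in (a single spelling rule, a single lexical
rule, an identity phoneme conversion), and all function names and the
sample transcription are hypothetical.

```python
import re


def normalize_spelling(text):
    """Step 120 stand-in: one regular spelling rule."""
    return re.sub(r"\bcolor\b", "colour", text)


def normalize_lexicon(text):
    """Step 130 stand-in: one lexical substitution."""
    return re.sub(r"\battorney\b", "solicitor", text)


def convert_phonemes(us_pron):
    """Step 140 stand-in: identity copy; real rules would rewrite phonemes."""
    return dict(us_pron)


def goldenize(candidates):
    """Step 150 stand-in: record candidates as not yet approved."""
    return {w: {"transcription": t, "golden": False} for w, t in candidates.items()}


def us_to_uk(us_script, us_pron):
    """Run steps 120-150 in order and return (UK script, UK pronunciation data)."""
    uk_script = [normalize_lexicon(normalize_spelling(line)) for line in us_script]
    uk_pron = goldenize(convert_phonemes(us_pron))
    return uk_script, uk_pron


uk_script, uk_pron = us_to_uk(
    ["Choose a color or say attorney"],
    {"menu": "/m-E-n-j-u/"},  # illustrative transcription
)
print(uk_script)  # ['Choose a colour or say solicitor']
print(uk_pron)    # {'menu': {'transcription': '/m-E-n-j-u/', 'golden': False}}
```

Marking every converted transcription as not yet golden leaves room
for the human review that the text recommends.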
B. Spelling Normalization 

The following describes several spelling rules useful for application
in conjunction with step 120. In some instances the rules may indicate
a part of speech; in those instances the script/grammar may be flagged
for manual treatment later. In another embodiment, exceptions of this
type are manually reviewed by a human operator either as they arise or
in batch. These embodiments allow step 130 to receive versions of the
script with normalized orthography according to British English. In
another embodiment, all errors from steps 120 and 130 are handled
prior to step 140. In another embodiment, the errors are only handled
after step 140, but the process of FIG. 1 can be repeated in such a
fashion as to allow the appropriate handling of just the necessary
portions of the US script/grammars.




The following table lists regular spelling differences. The affected
letters are noted in boldface with word-medial and word-final
positions indicated with an elliptical hyphen, e.g. -ae- and -re.
Additionally, an upper case letter "C" represents any orthographic
consonant.



TABLE 3

Consistent Spelling Differences

#   US spelling   UK spelling

1   -ll           -l
    US: enthrall, fulfill, instill, fulfillment, installment, skillful
    UK: enthral, fulfil, instil, fulfilment, instalment, skilful

2   -el           -ell
    US: beveled, canceled, caroled, channeled, chiseled, counseled,
        crystaled, cudgeled, dialed, disheveled, doweling, dueled,
        emboweled, enameled, funneled, gamboled, graveled, groveled,
        imperiled, labeled, leveled, jeweled, libeled, marshaled,
        marveled, medaled, modeled, paneled, patroled, penciled,
        petaled, pommeled, propeled, quarreled, reveled, rivaled,
        shoveled, shriveled, signaled, sniveled, stenciled, totaled,
        toweled, trammeled, traveled, tunneled
    UK: bevelled, cancelled, carolled, channelled, chiselled,
        counselled, crystalled, cudgelled, dialled, dishevelled,
        dowelling, duelled, embowelled, enamelled, funnelled,
        gambolled, gravelled, grovelled, imperilled, labelled,
        levelled, jewelled, libelled, marshalled, marvelled, medalled,
        modelled, panelled, patrolled, pencilled, petalled, pommelled,
        propelled, quarrelled, revelled, rivalled, shovelled,
        shrivelled, signalled, snivelled, stencilled, totalled,
        towelled, trammelled, travelled, tunnelled

3   -er           -re
    US: caliber, center, centimeter, fiber, kilometer, liter, meter,
        miter, niter, ocher, reconnoiter, saber, sepulcher, specter,
        theater
    UK: calibre, centre, centimetre, fibre, kilometre, litre, metre,
        mitre, nitre, ochre, reconnoitre, sabre, sepulchre, spectre,
        theatre

4   -e-           -ae-
    US: anesthetic, archeology, cesarean, dieresis, encyclopedia,
        etiology, esthete, feces, hemorrhage, medieval, peon
    UK: anaesthetic, archaeology, caesarean, diaeresis, encyclopaedia,
        aetiology, aesthete, faeces, haemorrhage, mediaeval, paeon

5   -e-           -oe-
    US: ameba, diarrhea, edema, esophagus, fetus, maneuver, phenix
    UK: amoeba, diarrhoea, oedema, oesophagus, foetus, manoeuvre,
        phoenix

6   -g            -gue
    US: analog, catalog, demagog, dialog, homolog, monolog, pedagog,
        travelog
    UK: analogue, catalogue, demagogue, dialogue, homologue,
        monologue, pedagogue, travelogue

7   -ize          -ise
    US: analyze, apologize, colorize, galvanize, localize, metalize,
        pulverize, recognize, summarize, televize, tranquilize
    UK: analyse, apologise, colorise, galvanise, localise, metalise,
        pulverise, recognise, summarise, televise, tranquillise

8   -C            -Cst
    US: amid, among, while
    UK: amidst, amongst, whilst

9   -g-           -gg-
    US: fagot (bundle of sticks), wagon
    UK: faggot, waggon

10  -m            -mme
    US: gram, program
    UK: gramme, programme

11  -se           -ce
    US: defense, license, offense, practise, pretense (all when used
        as nouns)
    UK: defence, licence, offence, practice, pretence

12  -o-           -ou-
    US: mold, molt, smolder
    UK: mould, moult, smoulder

13  -or           -our
    US: arbor, armor, behavior, candor, clamor, color, demeanor,
        enamor, favor, flavor, humor, neighbor, odor, parlor, savior,
        valor, vapor
    UK: arbour, armour, behaviour, candour, clamour, colour,
        demeanour, enamour, favour, flavour, humour, neighbour, odour,
        parlour, saviour, valour, vapour

14  in-           en-
    US: incase, inclose, indorse, inquire, insure, inure
    UK: encase, enclose, endorse, enquire, ensure, enure

15  -dg-          -dge-
    US: abridgment, acknowledgment, judgment
    UK: abridgement, acknowledgement, judgement

16  -ction        -xion
    US: connection, deflection, inflection, retroflection
    UK: connexion, deflexion, inflexion, retroflexion



Comments on Table 3:

Rule 1: Note that even when a word appears in additional forms, e.g.
"install" appearing as "installment", the spelling change occurs and
the UK spelling is "instalment".

Rule 2 is exemplified using the past participle, but it also applies
to several other inflections of the base morpheme, +ing, +ous, +er,
thus: marveling (US)→marvelling (UK); marvelous→marvellous;
marveled→marvelled; marveler→marveller.

Rules 4 and 5 indicate the UK preference for, and retention of, the
original Greek spelling, with or without the use of digraphs. These
rules will impact especially the alphabetical ordering of entries that
have initial ae- or initial oe- in UK English. These rules may be
difficult to apply in automated fashion; in some embodiments, word
etymologies are consulted for the source words in the US
script/grammar to determine whether these rules should be applied. In
other embodiments, they are flagged for manual review.


Also, rule 5 demonstrates that sometimes more than one 
rule may apply, e.g. "maneuver" (US) becomes "manoeu- 
vre" (UK) since both rule 3 and rule 5 apply. 

Rule 7 shows a continued preference for the -ise spelling 
in UK English dictionaries, but -ize is appearing progres- 
sively more frequently as a variant in UK publications and 
press. Contemporary dictionary entries now list -ize as an 
alternative. In uncertain cases, it might be useful to employ 
the Canadian system, which has -ize when the root is 
transparent (e.g. "capitalize" and "glamorize"), and -ise 
when the stem is opaque (e.g. "apologise" and "realise"). 

Rule 8 applies to a very few words, and the different 
spellings are reflected in the pronunciations. 

Rule 9 applies to a very few words. Oddly, US English 
retained the double-g in the slang word "faggot" used to 
refer (oft times in a derogatory fashion) to homosexuals. 

Rule 10 applies to the metric units: milligramme, 
centigramme, gramme, and kilogramme, as well as words 
like programme. The rule also applies to their respective 
derived forms. Oddly, the rule does not apply to the word 
"telegram" which is the same in US and UK English. 

Rule 11 serves in UK English to differentiate orthographically
the noun form from the verb form in pairs of words
which in US English have become homographic. Because
rule 11 requires identification of the part of speech, verb/ 
noun, it may be necessary to manually review the script to 
identify the appropriate form. In some embodiments, the 
script is flagged to indicate the need for manual review at 
that point in the script/grammar but the US English form is 
left in place. 

Rule 12 operates on a restricted set of words, cf. "boulder"
and "shoulder", where US English retained the UK
English spelling.

Rule 13 should be applied with care so as not to replace 
the agentive marker -or in names or places, e.g. "actor", 
"governor", "Bangor". 

Rule 14 is not universal in its application. Some words 
like "inquire" are used in both varieties; and "envelope" and 
"incur" do not have variants. 

Rule 16: "inspection" and "complexion" are standard in
both UK and US varieties.
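
The rule comments above suggest that regular rules need word lists and
exceptions around them. The sketch below illustrates that for three
rules only, under explicit assumptions: the function and table names
are hypothetical, each rule is restricted to a short allowlist drawn
from Table 3 (so words such as "actor" or "Bangor" are never touched),
and rule 11 words are left unchanged and flagged because they depend
on part of speech. Converted words are returned in lower case.

```python
import re

# Rule 2: base words whose final "l" doubles before these endings.
RULE2_BASES = {"cancel", "label", "marvel", "travel", "tunnel"}
RULE2 = re.compile(r"^(\w+l)(ed|ing|ous|er)$")

# Rule 13: -or becomes -our, only for listed words.
RULE13_WORDS = {"color", "favor", "flavor", "humor", "neighbor"}

# Rule 11: noun/verb distinction needed, so flag instead of converting.
RULE11_WORDS = {"defense", "license", "offense", "pretense"}


def normalize_word(word):
    """Return (uk_spelling, needs_review) for a single US English word."""
    lower = word.lower()
    if lower in RULE11_WORDS:
        return word, True  # leave the US form in place, flag for review
    match = RULE2.match(lower)
    if match and match.group(1) in RULE2_BASES:
        return match.group(1) + "l" + match.group(2), False
    if lower in RULE13_WORDS:
        return lower[:-2] + "our", False
    return word, False


for w in ["marveling", "marvelous", "traveled", "color", "actor", "license"]:
    print(w, normalize_word(w))
# marveling ('marvelling', False)
# marvelous ('marvellous', False)
# traveled ('travelled', False)
# color ('colour', False)
# actor ('actor', False)
# license ('license', True)
```

Driving each rule from an allowlist trades coverage for safety; a
pattern-only version of rule 13 would also rewrite agentive words and
place names.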

Additionally, a number of ad hoc spelling differences can be noted
and corrected:

US spelling          UK spelling

acommodations*       accommodation
aluminum             aluminium
all right (adv.)†    alright
appal                appall
busses or buses      buses
catsup               ketchup
check                cheque
chilli               chili (con carne)
curb                 kerb
czar                 tsar
gage or gauge        gauge
hiccup               hiccough
gray                 grey
jail                 gaol
jewelry              jewellery
kidnaped             kidnapped
licorice             liquorice
likable              likeable
mustache             moustache
pajamas              pyjamas
plow                 plough
skeptic              sceptic
story                storey (building)
sulfur               sulphur
tire                 tyre
veranda              verandah
whisky               whiskey
woolen, wooly        woollen, woolly
worshiping           worshipping
yogurt               yoghurt


* The (second) variant "acommodations", with one "c", for US English
is not listed in UK English dictionaries, and is regarded as wrong.
Notice also that US English uses a plural form (unjustified by
comparison with other Indo-European languages) for what is normally
classified as an uncountable noun.

† With respect to "all right" (US), the American Heritage Dictionary
of the English Language (1981) states that "alright" is a "common
misspelling" for the adverbial form "all right"; whereas UK English
proscribes "all right" except as a pronoun+adjective string.

This concludes the focus on orthographic changes to
words and we turn now to lexical normalization.
C. Lexical Normalization 

Lexical normalization accounts for the divergence in 

30 vocabularies between US and UK English. These diver- 
gences are believed to have occurred for four primary 
reasons: 

1. Necessity for expansions in US English to describe new
objects and experiences, achieved either by adaptation of
existing UK English words, or neologisms.

2. Technological and cultural developments causing
divergence in: food items, terms for car parts, sports, and
educational institutions.

3. Borrowing from different language sources, e.g. American
Indian, African languages, (South American) Spanish,
and Yiddish for US English.

4. Independent linguistic change within each variety,
whereby some archaisms are preserved or given new
meanings, or, conversely, are lost.

See TRUDGILL, P. and HANNAH, J. (1994) International
English: a Guide to Varieties of Standard English. London,
Edward Arnold, pp. 87-89.

Other strong influences may have included the puritanical
conservative influence of "WASPs" (white Anglo-Saxon
protestants), who have promoted the use of euphemism in
US English, to avoid reference to alcohol, sexual connotations
and bodily functions. For example, see the entries
below for "rooster", "beverage", and "washroom".

As noted previously, the following list is not exhaustive;
specialized lists focusing on financial terms, units of
measure, musical notation, automotive terms, betting terms,
botanical and zoological terms, food names, slang, cricket,
and traffic can be obtained in fairly comprehensive form
from sources such as SCHUR, N. W. (1987) British English
A to Zed. New York, Harper Collins, Appendix II.

Items marked with an asterisk ("*") are probable additions
to (or aliases from) an existing US dictionary. Where
a dash appears ("-") it indicates that there is no cultural/
semantic equivalent in the two linguistic societies.



US version                              UK version

-                                       A-road
academician                             academic*
acclimate                               acclimatise*
ad                                      advert*
adjuster                                assessor*
housing project                         council house
antenna                                 aerial
apartment                               flat*
attorney                                solicitor*
automobile                              car
-                                       B-road
baby buggy                              pram (perambulator)*
ballpoint                               biro*
baloney, bologna                        -
bangs                                   fringe*
barrette                                hairclip*
bathe (v. trans.)                       bath (v. trans.)*
bathroom                                loo
beets                                   beetroot*
beverage                                drink
big rig                                 juggernaut/eighteen wheeler
biscuit                                 scone*
-                                       blackspot
bookstore                               bookshop, bookseller
braid                                   plait*
bread box                               bread bin
broil (v.)                              grill (v.)
buffet                                  sideboard*
-                                       butty
calendar                                diary
candidacy                               candidature*
candy                                   sweets*
centennial                              centenary*
check                                   bill
checkers                                draughts*
cigarettes                              fags
chips                                   crisps*
closet                                  cupboard*, wardrobe*
club soda                               soda (water)
collectible                             collectable*, etc.
(ice-cream, etc.) cone                  cornet
cotton candy                            candyfloss*
cord                                    lead (lliyd)
counter-clockwise                       anti-clockwise*
court sessions                          assizes*
cracker                                 cheese biscuit
crosswalk                               zebra crossing
decal                                   transfer (n.)
derby (hat)                             bowler (hat)*
diaper                                  nappy*
disoriented                             disorientated*, etc.
drapes                                  curtains*
drugstore                               -
eggplant                                aubergine*
elevator                                lift*
endive                                  chicory*
fall (n.)                               autumn*
faucet                                  tap*
fire-truck                              fire-engine*
flashlight                              torch*
French fries                            chips*
funeral home                            funeral parlour*
garbage can, trashcan                   dustbin*
garter                                  suspender*
gasoline                                petrol*
German shepherd                         Alsatian*
gotten (past part.)                     got (past part.)
government employee                     civil servant*
freeway, highway, thruway, expressway   motorway*
hamburger (meat)                        mince*
hardware store                          ironmonger's*
hood                                    bonnet (of car)*
(life) insurance                        (life) assurance






jackhammer                              pneumatic drill*
jello                                   jelly
jelly                                   jam
kerosene                                paraffin*
kitty corner (to)                       diagonally opposite
kleenex (used generically)              tissue
-                                       lay-by
leash                                   lead
liquor store                            off-licence*
mail (n. and v.)                        post
mailbox                                 pillar-box*
math                                    maths*
molasses                                treacle*
mortician                               undertaker*
movie house/theater                     cinema
(the) movies                            (the) pictures
muffler                                 silencer (car)
napkin                                  serviette
noon                                    midday
outlet/socket                           power point*
overalls                                dungarees*
pants                                   trousers*
pantyhose                               tights
parakeet (small)                        budgerigar*
parking lot                             car-park*
pedestrian crossing                     zebra crossing
pharmacy                                chemist's (shop)
(couch) pillow                          cushion*
pillow-sham                             -
pin                                     brooch*
pitcher                                 jug
plaid                                   tartan*
prom                                    -
(nail) polish                           (nail) varnish
pudding                                 custard
pullman car                             sleeping car
purse                                   handbag*
realtor                                 estate agent*
realty                                  property/estate
reformatory                             borstal*
restroom, washroom                      toilet
retainers (dental)                      braces
resume                                  c.v. (curriculum vitae)
rooster                                 cock(erel)*
rutabaga                                swede*
school                                  university/college
sedan (car)                             saloon (car)
(shopping) cart                         (shopping) trolley*
(sales) clerk                           (sales) assistant
sheers                                  net curtains*
sidewalk                                pavement
slick                                   slippery
sneakers                                trainers*
soda                                    fizzy drink/pop
sorbet                                  sherbet*
specialty                               speciality*
squash (vegetable)                      marrow (vegetable)
sports                                  athletics
station wagon                           estate car*
stick-shift                             manual
storage room                            box room*
store                                   shop
stove                                   cooker*
subway                                  underground railway*
suspenders                              braces
sweater                                 jumper
swimsuit                                bathing costume*
taffy                                   toffee*
t.v.                                    telly
telephone booth                         telephone kiosk*/box
tow truck                               breakdown van*
tractor trailer                         juggernaut
traffic post                            bollard*
trailer/camper/mobile home              caravan*
trailer truck, tractor-trailer          articulated lorry* (alt. artic)
transportation (n.)                     transport (n.)*



trash, garbage                          rubbish*
trial lawyer                            barrister*
truck                                   lorry*
trunk                                   boot (of car)*
tuxedo                                  dinner jacket*
undershirt                              vest
underpants (women's)                    knickers*
vacation                                holiday(s)*
vacuum-cleaner                          hoover (n. & v.)*
vest                                    waistcoat
windshield                              windscreen*
wrench                                  spanner
xerox                                   photocopy*
yard                                    garden
zipper                                  zip*
zucchini                                courgettes*



With respect to the word "beverage", the usage in US 
English is assumed to have origins in puritanical and/or 
prohibitionist euphemism, where the standard English term 
for liquid refreshment, "drink", became tainted with the 
implied meaning "with alcohol". In the UK and Australia, 
the term "beverage" is considered archaic, and is more likely 
to refer to hot chocolate, tea, and soft drinks. See also, 
Trudgill at pp. 87-93 (discussing origins and examples of 
lexical items with no correspondences, or that have different, 
or additional meanings in the two varieties.) 

The range of tables of lexical equivalents provided can be 
determined based on the problem domain. For example, a 
stock trading phone application prepared in the US should 
be converted by the process of FIGURE when the lexical 
tables (of the sort above) for financial terms that are different 
between US and UK English are available to the computer 
program. 

When parts of speech are noted, the script/grammar/
words can be flagged for manual review. Also, when there is
additional context information necessary to make the lexical
determination, e.g. for underpants (women's) → knickers, the
word can be flagged for manual review.
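
As an illustration of how table-driven lexical normalization with manual-review flagging might look, the following sketch converts unambiguous entries and leaves flagged entries untouched for a human reviewer. It is a sketch under stated assumptions rather than the described system: `LexicalEntry`, `LEXICON` and `normalize_lexicon` are hypothetical names, the five entries are a small sample from the table above, and the input is assumed to be already split into words.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexicalEntry:
    uk: str
    # True when the table notes a part of speech or extra context,
    # so the substitution should not be made automatically.
    needs_review: bool = False


# Small sample of the US -> UK table; hypothetical structure.
LEXICON = {
    "elevator": LexicalEntry("lift"),
    "flashlight": LexicalEntry("torch"),
    "truck": LexicalEntry("lorry"),
    "broil": LexicalEntry("grill", needs_review=True),          # verb only
    "underpants": LexicalEntry("knickers", needs_review=True),  # women's only
}


def normalize_lexicon(words):
    """Return (converted_words, flagged) for a list of US English words.

    `flagged` holds (index, us_word, suggested_uk_word) tuples for entries
    that need manual review; those words are left unchanged in the output.
    """
    converted, flagged = [], []
    for index, word in enumerate(words):
        entry = LEXICON.get(word.lower())
        if entry is None:
            converted.append(word)
        elif entry.needs_review:
            converted.append(word)
            flagged.append((index, word, entry.uk))
        else:
            converted.append(entry.uk)
    return converted, flagged
```

Keeping the suggested UK word alongside each flagged position lets a reviewer accept or reject the substitution without consulting the table again.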

We now turn to pronunciations and the conversion of phonemic
transcriptions from US to UK English.
D. Phonemic Conversion 
1. Vowels 

A general post-vocalic r-dropping rule is required to
convert from US to UK English pronunciations. This rule
applies to monophthongs, diphthongs and triphthongs. In
many instances, the orthographic spelling of a vowel followed
by the letter "r" is represented in UK English by the
vowel schwa (/*/ in CPA) alone. In other cases, there is a
complete change of vowel quality. In the list below any
vowel is represented with V, and the symbol "#" indicates a
syllable boundary (# is not used in CPA; however it is shown
here to reflect the different vocalic qualities of the word).
Word-medial and word-final positions are indicated with an
elliptical hyphen (-). Where no hyphen appears, this means
the pronunciation change occurs in all lexical positions.




US CPA 


UK CPA 


Example Words 


At 


A 


hard, bar, car 


aw*r 


aw* 


hour, bower, cower 


ajr 


aj-* 


hire, lyre, liar, choir 


-aj-r 


-" or -i* 


Shropshire, etc. (N.B. speaker style dependent, 






may use both variations with -* preferred.) 


Or 


0 


hoard, lord, chord 


U-r 


u* 


boor, dour, lure, poor, you're 






heard, bird, fur, for, word, worker, hitter, butter, 






mother, worker 


Et 


a '. r 


harrow, barrel, carol 


-*r- 


- T- 


borough, burrow, concurrent, currency, currant, 






courage, Durham, flourish, furrow, furrier, hurry, 






hurricane, murrain, nourish, occurrence, scurry, 






slurry, surrogate, thorough, turret, worry 


-*r- 


m 

-U T- 


courier, mercury, obdurate, tournament, 






fniirmniipl 
UJUILUU UCI 


-O-r 


-A 




-U*r 




amateur, chauffeur, connoisseur, investiture, 






literature, masseur 


-U*r 


-0- 


your, yours, you're, yourself, yourselves 






(N.B. "-our" spelling) 






Other Vowel Shifts 


a 


A 


after, aghast, Alexander, can't, half, khaki, laugh, 






last 


-a- 


-e- 


data, datum, patent (idea and leather), Graham 






(N.B. 2 syllables in UK English), scabrous, status 


A 


O 


although, altar, bauxite, caulk, awful, etc. 






(N.B. pattern in orthography is "al", "au", "aw" 






in same syllabale) 


A 


@ 


abolish, aquatic, dog, hot, off, etc. (N.B. pattern 






in orthography is "CoC" or "CaC", where C is an 






orthographic consonant.) 


-A- 


-0- 


codicil, codify, docile, process, progress, troth 


-aj- 


-i- 


albino, anti-, iodine, labyrinthine, migraine, 






semi-, strychnine, clientele, endive (N.B. anti- 






rule does not apply to antidote (US) because 






anti- is not a prefix.) 


-aj- 


-I- 


dynasty, privacy, simultaneous, -eity (N.B. 






distinguishing from above rule may require case 






by case review.) 


-i- 


-E- 


CTetin, depot, ecumenical, egocentric, egotistic, 






etc., Oedipus, Petrarch 


-i- 


-aj- 


either, neither, carbine, elephantine, philistine, 






saline, serpentine, mercantile (N.B. distinguishing 






from above rule may require case by case 






review.) 


E 


i 


-centenary, epoch, esthete, (d)evolution, febrile, 






hygienic, lever, methane, predecessor 


E- 


I- 


esquire, erotic, expletive 


-e 


-I 


Monday, Tbesday, etc., always, holiday 


e 


-i- 


beta, lingerie, theta 


-e- 


-A- 


charade, esplanade, gala, promenade, stratum, 






tomato 


-a- 


-e- 


apparatus, apricot, compatriot, comrade, paleo-, 






prefix, patriarchal, -otic, -otism, patronage, -ise, 






phalanx 


-o- 


-@- 


baroque, compost, coquetry, coquette, dolorous, 






polka, produce (noun), protege, provost, scone, 






shone, sloth, sol-fa, sojourn, troll, Van Gogh, 






yoghurt 


-u- 


-U- 


room, broom 


-U- 


-@- 


grovel, hovel, hover, hovercraft 



US CPA UK CPA Example Words 



It 
E-r 



I#r 



R- Propping (Vr -> Vv, V or *) 

here, hear, peer 
mirror, pirouette, spirit 
hair, bear, care 



2. Consonants and Yod-Dropping

Yod-dropping after coronals in US English is one of the
most obvious differences, affecting the pronunciation of a
whole group of consonant phonemes: /t/, /d/, /n/, /l/, /s/, /z/,
/S/, /T/, and /Z/, as more fully reflected below:






US CPA UK CPA Examples 



d-j 



n-j 



T 
tS 



z-j 



s-tj or s-t-l 



i or z I 



-duct affix, conduit, credulity, deuced, 
dew, dual, ducal, due, -due, dule, duenna, 
duet, dune, duo-, dupe, -dup-, dur-, duty, 
endure, indubitable, irreducible, mildew, 
obdurate 

curlew, dilution, prelude 
anew, annuity, avenue, continuity, denude, 
diminution, diminutive, enumerate, ingenuity, 
inure, manure, minute (adj.), -new-, nubile, 
neur-, neuter, neutr-, newel, nuance, nucl-, 
nud-, nuisance, numer-, numis-, nutrition, 
parvenu, penurious, pneu-, revenue 
capsule, consulate, consume, consummate, 
insular, insuperable, marsupial, peninsula(r), 
pharmaceutical, pseudo-, -sume suffix 
cynosure 

exuberance, exude, presume 
-tud-, -tun-, -tup-, -tute, -tution affixes; tuber-, 
turn-, -tut- prefixes and infix, angostura, 
centurion, costume, futurity, impromptu, 
intuition, obtuse, perpetuity, petunia, 
pituitary, quintuplets, stew, steward, stu-, tuba, 
tube, tubular, T\idor, Tuesday, tuition, tulip, 
tulle, tumid, tuna, tunic, tureen 
enthuse, -iasm, etc., thews 
bastion, bestial, celestial (N.B. speaker style 
dependent, can list both as variations with 
/s-t-j/ as preferred.) 

brazier, casuist, crosier (N.B. speaker style 
dependent, can list both as variations with 
/z-j/ as preferred.) 



These rules can be fairly widely applied, with two notable
counter-examples: "coupon" and "erudite", which in US CPA
are transcribed /k-j u-p-A n/ and /E-r j-^d-aj-t/ respectively.
The first might be regarded as a form of hyper-correction,
whereby an unmotivated yod is introduced after a velar in a
(mistaken) attempt perhaps to mimic French. Since no other
spellings involving the string "-cou-" are pronounced /k j/,
this remains a spelling-to-sound singleton. The second may
be a simple spelling (mis)pronunciation.
3. Flaps 

The well-known, dialectally distinguishing realization of
orthographic "d" and "t" as a flap /!/ in US English does not
apply in UK English as shown below:






US CPA 


UK CPA 


Examples 






variations with is! preferred.) 


z 


S 


Asia, (-)version, cashmere, coercion, dispersion, 
excursion, immersion, incursion, Persia. 






Afrikaans asthma naus -ea -eate, -eous, etc.. 
exclusive, spouse 


z 


z 


ambrosia, amnesia, anaesthesia, euthanasia, 
glazier, hosier, hosiery, Indonesia, osier, Parisian 


D 


T 


booth, baths, cloths, earthen, moths, 
notwithstanding 


t 


tS 


immature, maturity 


tS 


W 


importunate (v. and n.), petulance, posthumous, 
prefecture, pustule, spatula, spirituous 


tS 


dZ 


Norwich, Greenwich, spinach (N.B. spelling 
"ch" vs. above.) 


dZ 


d-j 


-uous (suffix), cordial, deduce, fraudulent, 




glandular, incredulous, module, nodule, ordure, 
pendulous, pendulum, residual 




ks 


eczema, exhortation 



5. Reduction of Unstressed Syllables 

The following rules require syllabification of the underlying
words. The syllabification can occur manually or
automatically. In one embodiment, the US pronunciation
data 110 includes syllabification information. In another
embodiment, words requiring syllabification for processing
are identified for manual review during processing.



30 



US CPA UK CPA Examples 



-"•z-e-S-'-n -aj-z-e-S-' 



35 



-Ori 



40 



•b-r-o 



-etlv 



45 



-i-j'E'pi 



-b-r* 
-*-t-H 
-••ri 



US CPA 


UK CPA 


Examples 


■c!Iv 


-•■tlv 


t 


t or d 


better, hotter, martyr, body, odder, tardy 


-Ur 





50 -o- 



The phonological rule that inserts a flap for "d"/"t" in US
English may be expressed as: d, t → ! / lV(C)_(C)V. UK
English preserves the (underlying) /d/ and /t/ (and /n/) in
these environments.
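
One possible way to undo the flap when converting a US pronunciation is to consult the spelling to decide whether the underlying consonant is /t/ or /d/. The heuristic below is an assumption made for illustration, not a method stated in this description: it aligns the orthographic "t"/"d" letters (doubled letters counted once) with the /t/, /d/ and flap symbols in the pronunciation, and gives up (returning `None`, so the word can be sent for manual review) when the two sequences differ in length. The names `FLAP` and `restore_flaps` are hypothetical, and pronunciations are assumed to be lists of phoneme symbols.

```python
FLAP = "!"  # flap symbol as used in the US transcriptions above


def _orthographic_stops(spelling):
    """Return the 't'/'d' letters of a word in order, collapsing doubles."""
    stops, previous = [], ""
    for letter in spelling.lower():
        if letter in "td" and letter != previous:
            stops.append(letter)
        previous = letter
    return stops


def restore_flaps(spelling, phonemes):
    """Replace each flap with 't' or 'd' according to the spelling.

    Returns a new phoneme list, or None when the letters and the
    stop/flap phonemes cannot be aligned one-to-one.
    """
    stops = _orthographic_stops(spelling)
    slots = [i for i, p in enumerate(phonemes) if p in ("t", "d", FLAP)]
    if len(stops) != len(slots):
        return None  # ambiguous: leave for manual review
    result = list(phonemes)
    for letter, index in zip(stops, slots):
        if result[index] == FLAP:
            result[index] = letter
    return result
```

With this alignment, the flap in "better" is restored as /t/ and the flap in "tardy" as /d/, while words with silent letters fall through to manual review.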
4. Fricatives and Clusters 

Some differences revolve around the set of fricatives and 
affricates: 



55 



US CPA UK CPA Examples 



*0) 



berserk, blouse, diagnose, diesel, erase, -ese 
(suffix), exacerbate, fuselage, houses, Leslie, 
mimosa, parse, talisman, valise, vase 
issue, tissue, glacier, hessian, liquorice (N.B. 
speaker style dependent, can list both as 



60 



65 



-ajl 



centralization, characterization, 
demoralization, fertilization, generalization, 
localization, immobilization, immunization, 
improvisation 

capillary, culinary, anniversary, 
contemporary, and suffixes: -berry, -ary, and 
-ery. 

conservatory, conciliatory, depilatory, 
depository, exclamatory, extemporary, 
inflammatory, inventory, laboratory, 
lavatory, mandatory, signatory 
Edinburgh, Lougliborough, Scarborough, 
etc. 

authoritative, communicative, deliberative, 
generative, imaginative, imitative, legislative 
auxiliary, aviary, beneficiary, judiciary, 
pecuniary 

contemplative, meditative, operative, 

palliative (N.B. that flap rule is also being 

applied here) 

miniature, temperature 

alimony, antimony, ceremony, matrimony, 

patrimony, obedient, obey, pomade, 

thorough, volition 

candidate, vacation 

chartreuse, masseuse 

baboon, -man, chagrin, circumstance, 

papoose, saucepan 

Amazon (the rainforest), automaton, 

biathlon, capon, hexagon, eta, lexicon, 

marathon, triathlon, etc., occult, 

pantechnicon, pantheon, paragon, pentagon, 

phenomenon, pylon, python, silicon, 

tarragon, wainscot 

borough, brimstone, brocade, chromatic, 
Olympic, probation, proclaim, procure, 
profane, profound, prohibit, proliferate, 
prosaic, thorough 

contractile, docile, domicile, facile, fissile, 
fragile, futile, hostile, infantile, missile, 
(im)mobile, nubile, projectile, puerile, 
reptile, sensile, servile, sterile, tactile, 






us CPA 



UK CPA Examples 



tensile, versatile, virile, volatile 



6. Miscellaneous Pronunciation Differences 

In one embodiment of the invention, the CPA is extended 
to support representations for primary (and optionally 
secondary) word stress. The extension should be compatible 
with the underlying goals of the CPA: capable of represen- 
tation with lower ASCII characters (0-127), not interfere 
with symbols already in use in the CPA (especially in US or 
UK English CPA, but also in CPA representations used for 
other languages), and not conflict with common notations 
used in phonetics. Accordingly, in one embodiment the 
characters "1" and "2" which are easily represented in lower 
ASCII characters have been selected to represent primary 
and secondary word stress, respectively. Compare with IPA 
symbols ['] and [,], respectively. 

In one embodiment, the US pronunciation data 110
includes these stress characters in the augmented CPA
representation. In another embodiment, the stress characters
("1" and "2") are removed in an automated fashion before
pronunciations are provided to the Nuance speech recognizer
version 7 to prevent the characters from causing
errors. In another embodiment, the speech recognizer
software is augmented to recognize the stress characters
and use them appropriately in performing recognition.
In another embodiment, the text-to-speech software is modified
to make use of the stress characters. In another
embodiment, the stress characters are provided for the
benefit of human voice talent performing a script. The stress
characters may enable the human voice talent to read
unfamiliar words (or even familiar words with a different
pronunciation in UK English) properly.
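
The automated removal of stress characters before pronunciations are handed to a recognizer that does not understand them could be as simple as the following sketch. It assumes, for illustration, that a pronunciation is held as a list of symbols in which the stress characters "1" and "2" appear as separate items; `STRESS_MARKS` and `strip_stress` are hypothetical names, and the example phonemes are placeholders.

```python
# Stress characters of the augmented CPA, assumed to be separate list items.
STRESS_MARKS = {"1", "2"}


def strip_stress(phonemes):
    """Return the pronunciation with primary/secondary stress marks removed."""
    return [symbol for symbol in phonemes if symbol not in STRESS_MARKS]
```

Because the original list is not modified, the stressed form remains available for other consumers such as a script prepared for voice talent.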

In one preferred embodiment, human voice talent parts
are prepared with the expectation that they will be read by
a native speaker of the respective variety of English.
Accordingly, the UK script 160 would typically only show
pronunciations for words with which the native speaker
would be unfamiliar (e.g. outside a reference dictionary for
British English). The script can be augmented by showing
the correct pronunciation inline or below the respective line.
In some embodiments, the UK script 160 may be
produced by a non-native producer/director and accordingly
more pronunciations may be provided in a director's
version of the UK script 160 than in the version provided to
the voice talent. In one embodiment, a fully pronounced
version of the UK script 160 can be obtained where each line
includes a full phonetic pronunciation, as this may be of
particular aid to a non-native director or in the case where a
non-native speaker is asked to read a particular script.

The following lists several miscellaneous differences 
between US and UK English in pronunciation and the 
respective stress shifts when appropriate. ("1" shown in bold 
font to reduce possible confusion with capital "I" and 
lowercase "1" font glyphs) 



US-CPA 



UK-CPA 



Word 



lc-d-A-lf 

lal-m-**n-*r 

la-m-p-i-r 



la-d-@-l-f 
lA-m-*-n* 
la-m-p*e* 



Adolph 
almoner 
ampere 



-continued 


US -CPA 


UK-CPA 


Word 


5 e-t 


E-t 


ate (Note 1) 


a-!-*-lS-c 


Ut-a-S-e 


attache 


b-Ml-e 


lba-le 


ballet 


lbarldZ 


Iba-rA-Z 


barrage 


lb-e-z-M 


lb-a-z-M 


basil (herb and first 
name) 


10 lb-U-g-i 


lb-o-g-i 


bogey 


b-r'-lz-i-r 


lb-razi- 


brassiere 


lb-ritSIz 


lb-rltSI-z 


breeches 


lb-r-A-s-k 


lb-ru-s-k 


brusque 


lb-u-i- 


Ib-Oj 


buoy, -ancy. -ant 


k*lfe 


lkaf-e 


cafe 


j5 lk-an! *'l op 


lk-a-nt'-l-u-p 


cantaloupe 


lk-A-rnr*'1 


lkar'-m-El 


caramel 


lkAT-b-'T-e-!-*r 


k-Ab'IrEt- 


carburettor 


lk-a-r-MA-n 


k * lr l l j *-n 


carillon 


ltSa*z-j-*b-*-l 


USa-z-j-U-b-*-l 


chasuble 


silked" 


sl-lk-A-d* 


cicada 


lklab'rd 
20 lk-l-Vk 


lklapbOd 


clapboard 


lk-1-A-k 


clerk 


kl-Ojz*ln-e 


k-l-w-A-lz-^n-e 


cloisonne 


lk*m-M 


k-'-n-M 


colonel 


k-*-m-lp'A'Z'I-t 


lk-@-mp-*-z-ajt 


composite (v.) 


lkAn-B-'k-w-Ens 


lk@-nslk-w*-n*s 


consequence 


lk-A-n-s-'-m-e 


k-**n-ls-@-nve 


consomme 


25 lk-A-n-s-t-'-b-M 


lk- " •n-s-f-b-M 


constable 


kOrln-E-t 


lkO-n-I-t 


comet 


lk-OrME-ri 


k-*-lr-@-**ri 


corollary 


lk-A-z-nvo-s 


lk-<5?-zm@-s 


cosmos 


lk-ov*r-t 


lk- *v*-t 


covert 


lk-j-u-p-A-n 


lk-u-p-A-n 


coupon 


30 lk-Or-M-Z'*'n 


kOt-I-lzan 


courtesan 


lk-aj-o*!-! 


lk-Oj-ot-i 


coyote 


lk'U*k*U' 


lk-U-k-u 


cuckoo 


ld-A-k-s-h-U-n-d 


ld-ak-s-*n-d 


dachshund 


ldalj* 


ld-cll-* 


dahlia 


ld-'-rb-i 


ld-A-b-i 


derby 


« Ml i t 


ell-it 


elite 


lE-r* 


l-i-'-r" 


era 


lEr 


13 


err 


UreS'r 


MreZ* 


erasure 


lEskwaj-*r 


I-ls'k-waj-* 


esquire 


f-lkir 


lf-e-k-i-* 


fakir 


lf-I-ei*r 
40 8J 


lfl-g* 


figure, figurative, 
etc. 


f-in*nlsl-r 


f-aj-ln-a-n-si-" 


financier 


1 flutist 


If 1 O-tlst 


flautist 


lf-O-r-h-E-d 


If-© r I-d 


forehead 


UOrt 


f-O-rlt-e 


forte 


lfOj'r 


lFOje 


foyer 


45 itVe-k-*-s 


lf-ra-k-A 


fracas 


lfyU-r*r 


f-j-ulrO-r-i 


furore 


g-Mr-A-z 


IgarldZ 


garage 


dZ---ll-a-!-n-*-s 


dZ-"*ll-a-t-I-n-*-s 


gelatinous (Note 2) 


lglu-!-n-*-s 


lg-Wwln-'s 


glutinous (Note 2) 


lg-u*sb-ETi 


lg-U-z-b-ri- 


gooseberry 


50 lg-O rd 


lg-u*-d 


gourd 


lg-r-I-m-Ts 


g-pMm-e-s 


grimace (v., n.) 




lh-A*ri-m 


harem 


lh-E-kt**r 


lh-E-k-t-E-* 


hectare 


*rb 


h-3-b 


herb, herbal etc. 


lh-i-frro 


lh-I-r-#o 


hero (Note 3) 


55 h'U-fyh-U-vz 


huf/huvz 


hoof, hooves 


lhaw-s-w-aj-f'-r-i 


lh-aws-w-If-ri 


housewifery 


lh'r-Iken 


lh- " -rl-k-'-n 


hurricane 


lh-aj-p-A-t-n-us 


lh-aj-p@-tlnju*z 


hypotenuse 


laj-d-H-(-Ik) 


11 d-M ( I k) 


idyll, idyllic 


Ml- * sir *!Iv 


lll*-s-fr-*t-l-v 


illustrative 


Umb*sM 
I-m-lp-a-s 


ll-nvb'-s-H 


imbecile 


lanvpAs 


impasse 


l-nv2pT-A-vHz-eS-*-n 


I-m-p*r**vaj-lz-e-S- 


*n improvisation 


la-n-dZ-*-n-u 


la-Z-*-nj-u 


ingenue 


lln-sM-'r 


Hn-sjU-1-* 


insular 


lln-s*lln 


11 nsj-UHn 


insulin 


I-n*!-*Tln-i-S'*n 


Int-'lnisajn 


internecine 


65 I n lve-g -1 


Inlv-i-g-M 


inveigle 


ldZ-agwAr 


ldZagju-- 


jaguar 






US-CPA 



uk*cpa 



Word 



k-"-ll-A-m-'-t-"r 
U-a-b-r * t-O ri 
lla!n 
Hi-Z-*r 

ll'f'-Z'A'D 

l-u-lt-E-n-*-n-t 

llajlak 

MUc'r 

DOg-dZItud 

m-*lk-A-b-*r 

nva-t-ln-e 

lm-E-d-l-s-*-n 

lm-e-l-e 

lmE-t*-l VdZI-st 

lnvE-z-*-n-i-n 

lm-I-dwaj-f-Vi 

mi-Uju 

lm-1-s-M-e-n-i 

lm-o-k-- 

lnvO-r-I-b-*-nd 

m-aw-D-t-n-li-i 

lm- -s-t-a-S 

n-AnS-llAnt 

ln-A-n-s-E-n-s 

ln-ug**-t 

0- lm-e-g-* 
lo-m-*-n 
lAs-p-ri 
lp*A-p-A 
p*lp-ri--k-* 
p-Alt-c 

ffzilAgn'nri 

lpTL-m-'-tU-r 

pT*-lm*i-r 

pT-*-lm-i-*r 

lp-u-nv" 

lk-waj-n-aj-n 

rl'lm-A-n-s'l-re-t 

lrE-n-"-s-A-n-s 

lrE*s-t**r*-n-t 

IrE-V-W 

Ls-e-n*t-(Name) 

s-e-lt-a-n-*-k 

s *-lv-Ant 

lsk-E-dZul 

ls-E-k-*-n-d-(v.) 

ISi-k 

lslu- 

ls-A-d-*r 

ts-pE-S-"-l-t-i- 

U-k-w*rl 

lst*r*p 

s-'lb-O-ltVn 

s-'-g-ldZ-E-s-t 

t-a-ltu 

tEm-p-'IrE-rl-H 
lTe-!-»- 
IT-IT- *r 
ltOrd-z 
It-reki- 
tra-lpi-z 
It-r-aw-m-* 
lW-aj k- * -l-'r 

1- " -n-dZ-M-e-t 
[ -n-lE*rIg- 

" -nlt-O-rd 
vcs 

v-*rlm-u-T 

v •*-ls-[s-Wud 

lvaj'!'*-m- ,i n 

Iv-o-k-s-w-a-g -"-a 

lv-A-lj***m 

Iwesfkot 

lh-u-p 

lw-I-g-w-A-m 

Iw'rstEd 

IrAT 



lkll*m*if 

I- **lb<@-r*-t-ri 
llatln 

II- E-Z-* 
lMe-z*-n 

I- E-f-lt-E-n-'n't 
ll aj l -k 
H-lk-j-u' 

II- @-n-dZ-I-t-j-u-d 
m'-lk-A-b-r 
lmattn-e 
lm-E-dsI-n 
lmE-le 

m-*lt-al-*-dZr-st 

lrrrE-t-s-'-rri-n 

ml-dlwlf-ri 

lm-H-j-* 

nvHs-E-l-'n-i 

lm-@-f 

lm'@Tl'b'*-n-d 

m-awnt'I-ln-i* 

m-*-ls-t-A*S 

ln-@-n-S-*-l***n-t 

ln-@-rrs-*-n*s 

lD*U-g'A 

lo-ml-g-* 
lo-mE-n 

l@'S*pTC 
p*lp-A 
lp-a-prl-k* 
lpa-t-e 

f'I-z-i-l@-ii-"-nvi 

lprE-mMS-* 

lprE-mi* 

lprE-mic* 

lp-j-u-m** 

k-wMn-i-u 

lrE-m'* , n'S't , r*e , t 

rl'lne-s-'-n-s 

lrE-s-t-'TCgJ-n-t 

rl-lv-a-H 

ts-**n-t-(Name) 

s-*-lt-a-n-I-k 

ls-a-v-*-n-t 

lS-E-dju-1 

s-I-lk-@-n-d-(v.) 

lS-ek 

lslaw 

Isold-* 

s-pE-S-Mall-fi 

lsk-wlr-l 

ls-fl-r*-p 

Is * -b-* l-t-* n ■ 

s-*-ldZ-E-s-t 

t-Mt-u 

lt*E*m-pT"-l-i 

1TM-' 

1DID* 

t-MwOd-z 

t-r-*lk-i * 

tr*lp*i'Z 

lt-r-O-nv* 

lt-rl-k-*-!'* 

1- -n-d-ju-lc't 

* - n -13 r[-g~ 

* n t " lw'Od 
v-A-z 

lv-3m*T 

vaj-ls-I-s-H-j-u-d 

lvI*t**-m-['0 

lf-o-k-s-v-A-g*-n 

lv-@-l-ju-m 

lw-Esklt 

1-wu-p 

lwlg-wa-m 

lwU-stld 

lr@T 



kilometre 

laboratory 

Latin (Note 2) 

leisure 

liaison 

lieutenant 

lilac 

liqueur 

longitude 

macabre 

matinee 

medicine (Note 4) 
melee 

metallurgist, -urgy 

mezzanine 

midwifery 

milieu 

miscellany 

mocha 

moribund 

mountaineer 

moustache 

nonchalant 

nonsense 

nougat 

omega 

omen 

osprey 

papa 

paprika 

pate 

physiognomy 

premature 

premier 

premiere 

puma 

quinine 

remonstrate 

renaissance 

restaurant 

reveille 

Saint 

satanic 

savant 

schedule 

second 

sheik 

slough 

solder 

specialty/speciality 

squirrel 

stirrup 

subaltern 

suggest 

tattoo 

temporarily 

theta 

thither 

towards 

trachea 

trapeze 

trauma 

tricolour 

undulate 

unerring 

untoward 

vase 

vermouth 

vicissitude 

vitamin 

Volkswagen 

volume 

waistcoat 

whoop 

wigwam 

worsted (cloth) 

wrath 






US-CPA 


UK-CPA 


Word 


lzi 


lz-E*d 


z 


lZ'i-bT- 


lz-Ebr* 


zebra 


lz-inlT- 


lzE-n-IT 


zenith 


lz-i-#ro 


lz-i*-r#o 


zero (Note 3) 



Notes: (1) The prestige pronunciations of the past participle
of the verb 'eat' are reversed in US and UK English.
(2) Most US adjectival forms like '-inous' will elide the
penultimate vowel, thus reducing the penultimate syllable to
a syllabic consonant only. UK English retains the full
syllable. The same applies to disyllabic words, where the
second syllable is '-tin, -ton, -din, -don', etc. (3) Compare
'zero' for similar morphological splitting (cf. Wells, 1982b).
(4) 'Medicine' normally has only two syllables in UK
English.

7. R-Dissimilation

There are some cases in which the historical /r/ of US
English has been dropped; accordingly, some words are more
similar than might otherwise be expected in US-UK
pronunciations.

25 





USCPA 


UK-CPA 


Word 




lg " -v-*-n-*r 


lg- " -v*-n-* 


governor 


30 


s-'-lp-raj-z 


s-*lp*raj-z 


surprise 



See also WELLS, J. C. (1982b) Accents of English 3:
Beyond the British Isles. Cambridge, Cambridge University
Press, p. 490. However, some examples provided by Wells in
this category do not have non-rhotic US English versions
published in his (1990) Pronunciation Dictionary.
Accordingly, this pattern may be regarded as variable at 
least, and perhaps idiosyncratic. In one embodiment, it is 
implemented as a look up table of R-dissimilated words. In 
another embodiment, there is no special treatment of these 
words since the other rules provided above will handle them 
satisfactorily. 
E. Lexical Stress Shifts 

Many words have significant stress shifts that are also
expressed in different pronunciations, centered around the
(non-)reduction of vowels. The following lists several
examples of such stress shifts and exceptions. The appropriate
changes between stressed and unstressed vowels can
then be made in accordance with the stress shift.



US version 


UK version 


advertisement 


adlvertisement 


tally (v.) 


allly (v.) 


lapple sauce 


apple lsauce (Note 




1) 


alristocrat 


laristocrat 


balllet 


lballet (Note 2) 


1 ballyhoo 


bally 1 boo 


I Bangkok 


Banglkok (Note 3) 


Berlnard 


IBemard (Note 4) 


1 berserk 


berlserk 


cal Seine 


lea Seine 


lcapillary 


calpillary 


1 castrate 


casltrate 


lcigarette 


cigar lette 


co ml bat (v.) 


1 combat (v.) 






US version 



UK version 



comlbatant 

comlbative 

comlpensatory 

comlplex (adj.) 

comlposite (adj.) 

lconstitutive 

del tail (a. and v.) 

ldictate (v.) 

disl locate 

1 donate 

ellongate 

lextant 

fronltier 

1 frustrated 

lgyrate 

balrass 

imporltune 

imlpregnate 

inlculcatc 

inlculpate 

in 1 filtrate 

1 locate 

1 magazine 

lmamma 

lmanganese 

1 margarine 

malssage 

Imayonnaise 

lmayorcss 

1 migrate 

obolletc 

olregano 

pallliasse 

[partisan 

pasltel 

paltina 

perlmit (n.) 

lplacate 

prelferably 

prollix 

1 prospect (v.) 

1 pro spec tor 

lprotest (v.) 

lpulsate 

purlport (v.) 

1 quadrate 

1 qua tens ry 

I refugee 

1 research 

1 rotate 

stallacmite 

stallagmite 

Ispectator 

1 solitaire 

lserraled 

lstrip -search 

lsubmarine 

1 tangerine 

Itestator 

1 truncate 

1 vacate 

valgary 

1 vibrate 

lwaste paper 



1 combatant 

lcombativc 

compenlsatory 

lcomplex (adj.) 

lcomposite (adj.) 

conlstitutive 

ldetail (n. and v.) 

dicltate (v.) 

ldislocate 

dolnate 

lelongate 

exltant 

1 frontier 

frulstrated 

gy Irate 

1 harass 

1 importune 

limpregnate 

linculcate 

linculpate 

linfiltrate 

lolcate 

magalzine 

mam 1 ma 

manga lnese 

margalrine 

1 massage 

mayonnlaise 

mayrless 

milgrate 

1 obsolete 

orelgano 

1 palliasse 

partilsan 

lpastel 

IpatLna 

lpermit (n.) 

plalcate 

lpreferably 

lprolix 

prolspect (v.) 

prolspector 

proltest (v.) 

pullsate 

lpurport (v.) 

quad Irate 

qualtenary 

refulgee 

re 1 search 

rot late 

lstalacmite 

lstalagmite 

specltator 

soliltaire 

serlraled 

strip- tsearch 

submaLrine 

tanlgerine 

test la tor 

trunlcate 

valcate 

lvagary 

vib Irate 

waste lpaper (Note 1) 



applique 

attache 
5 ballet 

baton 

beret 

blase 

bonhomie 

boudoir 
10 brassiere 

brochure 

buffet 

cabaret 

cachet 

cafe 
25 chagrin 

chalet 

chamois 






Notes: (1) Most nominal compounds of this type are stressed 
on the first syllable in US English, and the second in UK 
English, cf: "ice cream", "blue jeans", "cream cheese". 

(2) The great majority of "borrowed" French disyllabic 
words are stressed on the final syllable in US English, and 
the first in UK English, cf: 



chateau 

chauffeur 

cloisonne 

comedienne 

consomme 

coupe 

crochet 

croquet 

cure 

debris 

debut 

decollete 

decor 

demode 

denouement 

dressage 

elite 



ennui 

enpassant 

entree 

expose 

fiance(e) 

rUlel 

gateau 

gauche rie 

massage 

matinee 

melange 

melee 

menage 

metier 

milieu 

mirage^ 

neglige 



nouveau 

outre 

parquet 

passe 

pastille 

pate 

peignoir 

perfume 

Peugeot 

pierrot 

pique 

plateau 

pot-pourri 

precis 

protege 

puree 

ragout 



rapport 

rapprochement 

Renault 

retrousee 

risque 

sachet 

saute 

son et lumiere 

sorbet 

souffle 

souvenir 

tableau 

loupe 

trousseau 

valet 



A couple of notable exceptions to the fairly comprehensive
list above are "debutante" and "résumé", which are both
stressed on the first syllable in US English. While not
commonly used in UK English ("curriculum vitae", or "c.v."
being preferred, see above), "résumé" is stressed on the first
syllable. The lack of alternative words and full lexicalization
of both probably accounts for the departure from the
"French" pattern for US English.

Also note that on the whole, UK English retains the
accents in the orthographic versions of the items in the above
list, whereas US English has dispensed with them, or uses
them unpredictably (or wrongly), e.g. the various forms of
"résumé" that are found commonly: "resume", "resumé",
and "résume". See Trudgill and Hannah (1994, p.86).

French personal names also receive stress on the first 
syllable in UK English, but the second in US English: 
Chopin, Degas, Monet, Renoir, etc. In one embodiment, 
these rules are applied using one or more tables of common
borrowed French words and names. In another embodiment, 
the US pronunciation data 110 may include etymological 
information. In another embodiment, the etymological infor- 
mation is referenced from another source. 
(3) Many polysyllabic foreign place names and derived
adjectives are stressed on the first syllable in US English,
and the second in UK English, cf: Azores, Baghdad, Belfast,
Bayreuth, Bucharest, Budapest, Byzantine, Calais, Caracas,
Caribbean, Himalayas, Hong Kong, and Singapore. Curaçao
is an exception to this rule: US Curaçao vs. UK ˈCuraçao.
(4) Many common English first names are stressed 
differently, with primary stress on the second syllable in US 
English, and the first in UK English, cf: Charlene, Doreen, 
Eileen, Eugene, Irene, Kathleen. 
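
As a rough illustration of the table-driven embodiment mentioned in
note (2), the stress shifts in notes (2) through (4) could be looked
up as follows. This is a minimal sketch, not the patented
implementation: the table names, the function names, the apostrophe
used as a stress mark, and the use of pre-syllabified spellings are
all assumptions made for the example, and the tables hold only a few
of the words listed above.

```python
# Hypothetical sample tables; a real system would hold the full lists
# given in notes (2) through (4).

# Notes (2) and (4): borrowed French words, French personal names, and
# common first names take primary stress on the first syllable in UK English.
FIRST_SYLLABLE_IN_UK = {
    "ballet", "beret", "buffet", "chalet", "sachet", "valet",
    "chopin", "degas", "monet", "renoir",
    "charlene", "doreen", "eileen", "eugene", "irene", "kathleen",
}

# Note (3): foreign place names take primary stress on the second syllable.
SECOND_SYLLABLE_IN_UK = {"baghdad", "belfast", "caracas"}


def uk_stress_index(word, us_stress_index):
    """Return the 0-based index of the syllable stressed in UK English."""
    key = word.lower()
    if key in FIRST_SYLLABLE_IN_UK:
        return 0
    if key in SECOND_SYLLABLE_IN_UK:
        return 1
    # No table entry: assume the US stress placement carries over.
    return us_stress_index


def mark_stress(syllables, index, mark="'"):
    """Join syllables, placing the stress mark before the stressed one."""
    return "".join((mark if i == index else "") + s
                   for i, s in enumerate(syllables))


print(mark_stress(["bal", "let"], uk_stress_index("ballet", 1)))
print(mark_stress(["Bagh", "dad"], uk_stress_index("Baghdad", 0)))
```

Words absent from both tables keep their US stress, so exceptions such
as "debutante" need no special handling in this sketch.
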
F. Implementation Concerns 

The above rules can be described through one or more of 
regular expression substitutions, productions (e.g. this 
sequence of symbols becomes this), and/or rules (e.g. as 
described above in an appropriate form for computer 
implementation).

The rules that do not require syllabification or determi- 
nation of parts of speech can generally be implemented with 
straightforward regular expressions matching on a mixture 
of the orthography and the corresponding transcription. 
Rules that require syllabification (as well as determination of
stress, etymology, or part of speech) may also require regular 
expression matching on such a representation. 
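
A minimal sketch of such a regular expression rule, assuming a
hypothetical combined representation of the form
"spelling|transcription", is shown below. It rewrites a flap, written
here as "!", to /t/ or /d/ depending on the spelling. The rule list,
the function name, and the sample transcriptions are illustrative
assumptions and are not actual CPA data.

```python
import re

# Each rule matches on the spelling (left of "|") and rewrites the
# transcription (right of "|"). Hypothetical rules for illustration only.
RULES = [
    # Spelling has an intervocalic "t" or "tt": the flap becomes /t/.
    (re.compile(r"^([^|]*[aeiou]tt?[aeiouy][^|]*\|[^|]*)!"), r"\1t"),
    # Spelling has an intervocalic "d" or "dd": the flap becomes /d/.
    (re.compile(r"^([^|]*[aeiou]dd?[aeiouy][^|]*\|[^|]*)!"), r"\1d"),
]


def convert_transcription(spelling, transcription):
    """Apply each rule in turn to the combined spelling|transcription entry."""
    entry = f"{spelling.lower()}|{transcription}"
    for pattern, replacement in RULES:
        entry = pattern.sub(replacement, entry)
    # Return only the transcription half of the entry.
    return entry.split("|", 1)[1]


print(convert_transcription("butter", "b^!R"))
print(convert_transcription("ladder", "la!R"))
```

Because each rule rewrites at most one flap per pass and the rules run
in a fixed order, a word whose spelling contains both an intervocalic
"t" and an intervocalic "d" would need a finer alignment between
letters and phonemes than this sketch attempts.
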

Some care can be taken to ensure that multiple rules can
be applied to a word or phrase, e.g. spelling of "maneuver"
as "manoeuvre" in UK English, which would require pattern
matching both for the trailing "-er" spelling and separately
pattern matching for the Greek etymology and the replacement
of the lexical "e" in US English with the original "oe"
spelling. Similar pattern matching can be done to correct the
transcriptions, e.g. a flap, /!/, in a US English transcription
can simply be replaced by the appropriate /t/ or /d/ according
to the spelling.

G. Web Based Interface

Individual application programmers that are developing
voice applications may encounter several problems as they
try to internationalize their voice applications. Further,
despite their closeness as varieties of a single language, the
above tables and discussion show that US and UK English
include significant variations in pronunciation, spelling and
word choice. Individual developers may lack adequate
resources, knowledge, and expertise to easily transform their
application from US to UK English.

Accordingly, in some embodiments of the invention, a
web based interface is provided to allow users to submit
scripts and grammars, either explicitly or implicitly, for
conversion. In one embodiment, the submission is explicit,
e.g. US grammar 102 is submitted, using one or more
standard web based formats (e.g. URL of US grammar 102,
HTTP file upload of US grammar 102, direct entry of US
grammar 102 into an HTML form, etc.). In other
embodiments, the script or grammar is derived from analysis
of application program code such as grammars in an
application. For example, if an application programmer provides
the application program code for an application, the
grammars identified in the application could be treated as the US
grammars 102.

If a web-based voice application development environment
is provided (see for example, U.S. patent application
Ser. No. 09/592,241 entitled "Method and Apparatus for
Zero-Footprint Phone Application Development", filed Jun.
13, 2000, having inventors Kunins, Jeff C., et al., and
assigned to the assignee of the present invention) then either
the implicit or explicit approach can be integrated with such
an environment. Additionally, the US-to-UK conversion can
be used in conjunction with web based phonemic transcription
services (see for example, U.S. patent application Ser.
No. 09/721,373, entitled "Automated Creation of Phonemic
Variations", filed Nov. 22, 2000, having inventor Caroline G.
Henton).

In one embodiment, the conversion service is provided as
a paid-for service to developers. In another embodiment,
developers are provided a limited quantity of free conversions
(e.g. a certain number of grammars or lines of scripts).
In yet another embodiment, different prices are charged
depending on whether human intervention is required (e.g. fully
automatic conversions are free, but those requiring human
intervention are charged). In some embodiments, developers
are not provided access to the resulting UK phonemic
transcriptions. In some embodiments, grammars and scripts
are batched across multiple developers, e.g. from their
applications and grammars, and those words that appear in at
least N different locations are sent for conversion (possibly
without notification to the respective developers).

Because of the competitive value of good pronunciations
to the operator of a voice platform, in some embodiments,
developers may only access the US/UK pronunciations for
those words they have paid for transcription. Thus in such an
embodiment, if developer X has "Kodak" transcribed she
can see the pronunciation she paid for (or requested/received
free). But developer Y cannot (unless she pays for transcription
of the word, etc.). Similarly, if a developer of a time of
day application had "Greenwich" transcribed for US English
they may be required to pay separately for the UK English
transcription.

H. Conclusion

In some embodiments, processes of FIG. 1 can be
implemented using hardware based approaches, software based
approaches, and/or a combination of the two. In some
embodiments, phonemic transcription, textual
normalization, spelling normalization, lexical
normalization, phonemic conversion, and lexical stress
shift are carried out using one or more computer programs
that are included in one or more computer usable media such
as CD-ROMs, floppy disks, or other media. In some
embodiments, conversion programs, script handling
programs, spelling conversion programs, and/or syllabification
programs are included in one or more computer usable
media.

Some embodiments of the invention are included in an
electromagnetic waveform. The electromagnetic waveform
comprises information such as US-to-UK conversion
programs, script handling programs, and/or syllabification
programs. The electromagnetic waveform may include the
programs accessed over a network.

The foregoing description of various embodiments of the
invention has been presented for purposes of illustration and
description. It is not intended to limit the invention to the
precise forms disclosed. Many modifications and equivalent
arrangements will be apparent.

What is claimed is:

1. A method of generating a British English phonemic
transcription from a word and an American English phonemic
transcription of the word using a computer system:

providing the word to a first computer program, the first
computer program transforming the spelling of the
word according to British English spelling conventions;

providing the word to a second computer program, the
second computer program for lexically transforming
the word to a semantically equivalent word used in
British English and marking the word when no semantic
equivalent is found and when the semantic equivalent
cannot be identified; and

applying phoneme conversion rules with a third computer
program to transform an American English phonemic
transcription of the semantically equivalent word to the
British English phonemic transcription.

2. The method of claim 1, wherein the first computer
program, the second computer program and the third computer
program operate without use of a dictionary of British
English or a comprehensive word list for British English
showing one or more of spelling, pronunciation, and word
stress.

3. The method of claim 1, wherein the first computer
program uses a table comprising rules applicable to certain
orthographic forms of the word to produce the British
English spelling.

4. The method of claim 1, wherein the first computer
program includes an ad hoc table of a limited number of ad
hoc spelling differences between American and British
English that are not easily expressed in meaningful orthographic
transformation rules, the table not including more
than one hundred (100) words.

5. The method of claim 1, wherein the second computer
program includes an ad hoc table of ad hoc lexical differences
between American and British English, the ad hoc
table including lists of words specific to specific areas
selected from the set of financial terms, units of measure,
musical notation, automotive terms, betting terms, botanical
and zoological terms, food names, slang terms, cricket and
sports terms, traffic terms, and/or miscellaneous common
terms.




6. The method of claim 1, wherein when the second 
computer program transforms the word to a semantically 
equivalent word, the American English phonemic transcrip- 
tion is similarly replaced with an American English phone- 
mic transcription for the semantically equivalent word. 

7. The method of claim 1, wherein the second
program provides for interactive input and adjustment to
marked words prior to the application of the third computer
program.

8. The method of claim 1, wherein the American English 
phonemic transcription is represented according to the Com- 
puter Phonetic Alphabet (CPA) for American English and 
wherein the British English phonemic transcription is rep- 
resented according to the CPA for British English. 

9. The method of claim 8, wherein the CPA representa- 
tions of phonemic transcriptions indicate primary and sec- 
ondary word stress. 

10. The method of claim 9, wherein the CPA representa- 
tion of primary and secondary word stress uses lower ASCII 
character symbols that are not used in either the CPA for 
American English or the CPA for British English. 

11. The method of claim 9, wherein the third computer 
program uses changed word stress between American 
English and British English to transform the American 
English phonemic transcription to the British English pho- 
nemic transcription. 

12. The method of claim 11, wherein changed word stress
is determined according to one or more rules and an ad hoc
table of changed word stress.

13. An apparatus for transforming a voice application 
prepared for American English speakers for use by British 
English speakers, the apparatus comprising: 

a first means for analyzing the voice application to
identify one or more scripts and one or more grammars,
the one or more scripts corresponding to textual mate-
rial for presentation by one of a text-to-speech system
and a human voice talent, the one or more grammars
corresponding to descriptions of words that the voice
application must be capable of responding to in one or
more states;

a second means for automatically performing spelling and
lexical normalization of the one or more scripts and the
one or more grammars from American English to
British English, the normalization producing one or
more predetermined markings, the markings being
indicative of words likely to require manual review;

a third means for permitting manual review of the
normalized scripts and grammars; and

a fourth means for performing phonemic conversions for 
words in the scripts and the grammars from American 
English pronunciations to British English pronuncia- 
tions. 


14. The apparatus of claim 13, further comprising a fifth 
means for generating an augmented script, the augmented 
script including British English pronunciation for at least 
one word. 

15. The apparatus of claim 13, further comprising a fifth
means for providing a web based interface to the first,
second, third and fourth means and wherein the third means
is adapted to perform in a web based environment.

16. The apparatus of claim 13, further comprising a fifth
means for automatically applying the first, second, third, and
fourth means to a voice application without an explicit
request to transform the voice application.

* * * * * 



